Enable training strategy for Indexer #3415
Conversation
🤖 Hi @RissyRan, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
This PR enables the selective parameter training strategy (dense warm-up and sparse training stages) for the DeepSeek V3.2 Indexer. It refactors parameter freezing flags and adds tests to verify proper isolation of indexer gradients from the rest of the model.

🔍 General Feedback

- Memory Optimization in Selective Training: The current implementation of optimizer masking computes and stores Adam state parameters for the entire model before zeroing out the updates. I've suggested an explicit mapping with `optax.multi_transform` to avoid allocating massive memory blocks for frozen parameter states, which is critical for 671B model scaling.
- Gradient Isolation in KL Divergence: I left an inline comment pointing out a gradient leak when calculating the KL divergence in `calculate_indexer_loss`. Ensure `jax.lax.stop_gradient` is applied to the target `attention_probs` distribution, so that the main model's queries and keys do not get updated by the indexer's loss.
shuningjin left a comment:
Thanks! Sparse training looks good to me. I left a suggestion on dense warmup, along with minor comments. Will take another look at `trainable_parameters_mask` soon.
shuningjin left a comment:
Looks great! Thanks for the change.
Rohan-Bierneni left a comment:
Thank you for the changes! I have left a small nit.
Description
Enable selective parameter training strategy for DeepSeek V3.2 Indexer (per the paper):

- Add `trainable_parameters_mask` flag, allowing specific parameters to be targeted for training while freezing the rest of the model. Add `TrainableParametersMaskTest` unit tests for validation.
- Add `indexer_sparse_training` flag to indicate the Dense Warm-up stage or the Sparse Training stage for DS v3.2. Add `test_indexer_gradients` unit test to verify proper gradient isolation.
- Rename config keys: `use_sparse_indexer` --> `use_indexer`; `index_head_dim` --> `indexer_head_dim`; `index_n_heads` --> `indexer_n_heads`; and `index_topk` --> `indexer_topk`.

Tests
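The `trainable_parameters_mask` idea described in this PR can be sketched as a path-matching pass over the parameter pytree. This is a hedged illustration under assumed names: the path layout, the `indexer_sparse_training` stage flag usage, and the regex patterns are illustrative, not MaxText's actual config.

```python
import re

import jax
import jax.numpy as jnp

# Illustrative parameter tree with an indexer branch nested in the decoder.
params = {
    "decoder": {
        "attention": {"query": jnp.ones((2, 2))},
        "indexer": {"wq": jnp.ones((2, 2))},
    },
}

def build_mask(params, patterns):
    """Return a pytree of bools: True where the leaf's path matches any
    trainable pattern, False (frozen) otherwise."""
    def match(path, _):
        name = jax.tree_util.keystr(path)
        return any(re.search(p, name) for p in patterns)
    return jax.tree_util.tree_map_with_path(match, params)

indexer_sparse_training = True  # illustrative stage flag
# In this illustrative stage, only indexer params are trainable;
# otherwise every parameter matches and nothing is frozen.
patterns = [r"indexer"] if indexer_sparse_training else [r".*"]
mask = build_mask(params, patterns)

# Zero out gradients for frozen parameters before the optimizer step.
grads = jax.tree_util.tree_map(jnp.ones_like, params)
masked_grads = jax.tree_util.tree_map(
    lambda g, keep: g if keep else jnp.zeros_like(g), grads, mask
)
```

A gradient-isolation test in the spirit of `test_indexer_gradients` would then assert that every non-indexer leaf of `masked_grads` is exactly zero while indexer leaves are untouched.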
Checklist
Before submitting this PR, please make sure (put X in square brackets):
`gemini-review` label.